Top-k Ranked Document Search in General Text Databases

نویسندگان

J. Shane Culpepper

Gonzalo Navarro

Simon J. Puglisi

Andrew Turpin

چکیده

Text search engines return a set of k documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing optimizations of data structures and algorithms for ranking. However, in many new search domains (DNA, multimedia, OCR texts, Far East languages) there is often no obvious definition of words and traditional indexing approaches are not so easily adapted, or break down entirely. We present two new algorithms for ranking documents against a query without making any assumptions on the structure of the underlying text. We build on existing theoretical techniques, which we have implemented and compared empirically with new approaches introduced in this paper. Our best approach is significantly faster than existing methods in RAM, and is even three times faster than a state-of-the-art inverted file implementation for English text when word queries are issued.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Keyword Search in Text Cube: Finding Top-k Relevant Cells

We study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell...

متن کامل

Guest Editors Introduction: Special Section on Keyword Search on Structured Data

WITH the prevalence of Web search engines, keyword search has become the most popular way for users to retrieve information from text documents. On the other hand, there is an enormous amount of valuable information stored in structured form (relational or semistructured) in Internet, intranet, and enterprise databases. To query such data sources, users traditionally depended on specialized app...

متن کامل

Overview of Top-k Query Processing in Relational Databases

Query processing is a fundamental part of Database management system. As the amount of text data stored in relational databases is increasing, it is necessary to support the Top-k query processing over text data. The main objective of top-k query processing is to return the k highest ranked results quickly and efficiently. In this paper, we introduce the Top-k query processing in relational dat...

متن کامل

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...

متن کامل

TopX: efficient and versatile top-k query processing for text, structured, and semistructured data

TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonous score aggregation function with respect to a multidimensional query. The main contributions of the thesis unfold into four main points, confirmed by previous publications at international conference...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Top-k Ranked Document Search in General Text Databases

نویسندگان

چکیده

منابع مشابه

Keyword Search in Text Cube: Finding Top-k Relevant Cells

Guest Editors Introduction: Special Section on Keyword Search on Structured Data

Overview of Top-k Query Processing in Relational Databases

Text Summarization Using Cuckoo Search Optimization Algorithm

TopX: efficient and versatile top-k query processing for text, structured, and semistructured data

عنوان ژورنال:

اشتراک گذاری